Modified policy iteration algorithms are not strongly polynomial for discounted dynamic programming
Authors
Abstract
This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem with three states and four actions may grow arbitrarily. Therefore any such algorithm is not strongly polynomial. In particular, the modified policy iteration and λ-policy iteration algorithms are not strongly polynomial.
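To make the class of algorithms concrete, here is a minimal numpy sketch of modified (optimistic) policy iteration on a generic finite discounted MDP. The array shapes, parameter names, and the random toy instance are illustrative assumptions; this is not the three-state, four-action counterexample constructed in the note.

```python
import numpy as np

def modified_policy_iteration(P, r, beta, m=10, tol=1e-9, max_iter=100_000):
    """Modified (optimistic) policy iteration on a finite discounted MDP.

    P: transition probabilities, shape (A, S, S); r: rewards, shape (S, A);
    beta: discount factor in [0, 1); m: partial-evaluation sweeps per round.
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # Improvement step: greedy policy from a one-step lookahead.
        q = r + beta * np.einsum("asj,j->sa", P, v)  # Q[s, a]
        pi = q.argmax(axis=1)
        # Partial evaluation: apply the policy's Bellman operator m times
        # instead of solving the linear system exactly, as policy iteration would.
        r_pi = r[np.arange(S), pi]
        P_pi = P[pi, np.arange(S), :]
        v_new = v.copy()
        for _ in range(m):
            v_new = r_pi + beta * P_pi @ v_new
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, pi
        v = v_new
    return v, pi

# Toy instance (random, for illustration only; not the paper's counterexample).
rng = np.random.default_rng(0)
A, S = 4, 3
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into distributions
r = rng.random((S, A))
v_star, pi_star = modified_policy_iteration(P, r, beta=0.95, m=10)
```

The parameter m interpolates between value iteration (m = 1) and policy iteration (m large), which is the sense in which a single lower bound can cover a broad class of optimistic algorithms.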
Similar resources
The value iteration algorithm is not strongly polynomial for discounted dynamic programming
This note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow arbitrarily quickly with the size of the problem. In particular, the number of iterations can be exponential in the number of actions. Thus, unlike policy iteration, the value iteration algorithm is not strongly polynomial for discounted dynamic programming.
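For contrast, here is a minimal value iteration loop under the same assumed array conventions, returning the iteration count, which is the quantity this note shows can grow arbitrarily:

```python
import numpy as np

def value_iteration(P, r, beta, tol=1e-9, max_iter=1_000_000):
    """Plain value iteration; also reports how many iterations were used."""
    A, S, _ = P.shape
    v = np.zeros(S)
    for k in range(1, max_iter + 1):
        # Bellman optimality update: v <- max_a [ r + beta * P v ].
        q = r + beta * np.einsum("asj,j->sa", P, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1), k
        v = v_new
    return v, q.argmax(axis=1), max_iter
```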
Policy Iteration Algorithms for DEC-POMDPs with Discounted Rewards
Over the past seven years, researchers have been trying to find algorithms for the decentralized control of multiple agents under uncertainty. Unfortunately, most of the standard methods are unable to scale to real-world-size domains. In this paper, we present promising new theoretical insights for building scalable algorithms with provable error bounds. In the light of the new theoretical insi...
Smooth Value and Policy Functions for Discounted Dynamic Programming
We consider a discounted dynamic program in which the spaces of states and actions are smooth (in a sense that is suitable for the problem at hand) manifolds. We give conditions that ensure that the optimal policy and the value function are smooth functions of the state when the discount factor is small. In addition, these functions vary in a Lipschitz manner as the reward function-discount factor...
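For reference, the object whose smoothness is studied is the fixed point of the Bellman equation of the discounted program. One common formulation, in assumed notation with g a deterministic transition map on the state manifold and A(x) the feasible actions at x:

```latex
V(x) \;=\; \max_{a \in A(x)} \Bigl\{\, r(x,a) + \beta\, V\bigl(g(x,a)\bigr) \,\Bigr\},
\qquad 0 < \beta < 1 .
```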
Constrained Discounted Dynamic Programming
This paper deals with constrained optimization of Markov Decision Processes with a countable state space, compact action sets, continuous transition probabilities, and upper semi-continuous reward functions. The objective is to maximize the expected total discounted reward for one reward function, under several inequality constraints on similar criteria with other reward functions. Suppose a fe...
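A sketch of the constrained problem this abstract describes, in assumed notation: r_0 is the objective reward, r_1, ..., r_K are the constraint rewards with bounds C_1, ..., C_K, and the maximization is over policies π:

```latex
\max_{\pi}\; \mathbb{E}^{\pi}_{x} \sum_{t=0}^{\infty} \beta^{t}\, r_{0}(x_{t}, a_{t})
\quad \text{subject to} \quad
\mathbb{E}^{\pi}_{x} \sum_{t=0}^{\infty} \beta^{t}\, r_{k}(x_{t}, a_{t}) \;\ge\; C_{k},
\qquad k = 1, \dots, K .
```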
Journal
Journal title: Operations Research Letters
Year: 2014
ISSN: 0167-6377
DOI: 10.1016/j.orl.2014.07.006